{
 "nbformat_minor": 1, 
 "nbformat": 4, 
 "cells": [
  {
   "source": [
    "# Generative Adversarial Networks (GANs)\n", 
    "\n", 
    "So far in CS231N, all the applications of neural networks that we have explored have been **discriminative models** that take an input and are trained to produce a labeled output. This has ranged from straightforward classification of image categories to sentence generation (which was still phrased as a classification problem, our labels were in vocabulary space and we\u2019d learned a recurrence to capture multi-word labels). In this notebook, we will expand our repetoire, and build **generative models** using neural networks. Specifically, we will learn how to build models which generate novel images that resemble a set of training images.\n", 
    "\n", 
    "### What is a GAN?\n", 
    "\n", 
    "In 2014, [Goodfellow et al.](https://arxiv.org/abs/1406.2661) presented a method for training generative models called Generative Adversarial Networks (GANs for short). In a GAN, we build two different neural networks. Our first network is a traditional classification network, called the **discriminator**. We will train the discriminator to take images, and classify them as being real (belonging to the training set) or fake (not present in the training set). Our other network, called the **generator**, will take random noise as input and transform it using a neural network to produce images. The goal of the generator is to fool the discriminator into thinking the images it produced are real.\n", 
    "\n", 
    "We can think of this back and forth process of the generator ($G$) trying to fool the discriminator ($D$), and the discriminator trying to correctly classify real vs. fake as a minimax game:\n", 
    "$$\\underset{G}{\\text{minimize}}\\; \\underset{D}{\\text{maximize}}\\; \\mathbb{E}_{x \\sim p_\\text{data}}\\left[\\log D(x)\\right] + \\mathbb{E}_{z \\sim p(z)}\\left[\\log \\left(1-D(G(z))\\right)\\right]$$\n", 
    "where $z \\sim p(z)$ are the random noise samples, $G(z)$ are the generated images using the neural network generator $G$, and $D$ is the output of the discriminator, specifying the probability of an input being real. In [Goodfellow et al.](https://arxiv.org/abs/1406.2661), they analyze this minimax game and show how it relates to minimizing the Jensen-Shannon divergence between the training data distribution and the generated samples from $G$.\n", 
    "\n", 
    "To optimize this minimax game, we will aternate between taking gradient *descent* steps on the objective for $G$, and gradient *ascent* steps on the objective for $D$:\n", 
    "1. update the **generator** ($G$) to minimize the probability of the __discriminator making the correct choice__. \n", 
    "2. update the **discriminator** ($D$) to maximize the probability of the __discriminator making the correct choice__.\n", 
    "\n", 
    "While these updates are useful for analysis, they do not perform well in practice. Instead, we will use a different objective when we update the generator: maximize the probability of the **discriminator making the incorrect choice**. This small change helps to allevaiate problems with the generator gradient vanishing when the discriminator is confident. This is the standard update used in most GAN papers, and was used in the original paper from [Goodfellow et al.](https://arxiv.org/abs/1406.2661). \n", 
    "\n", 
    "In this assignment, we will alternate the following updates:\n", 
    "1. Update the generator ($G$) to maximize the probability of the discriminator making the incorrect choice on generated data:\n", 
    "$$\\underset{G}{\\text{maximize}}\\;  \\mathbb{E}_{z \\sim p(z)}\\left[\\log D(G(z))\\right]$$\n", 
    "2. Update the discriminator ($D$), to maximize the probability of the discriminator making the correct choice on real and generated data:\n", 
    "$$\\underset{D}{\\text{maximize}}\\; \\mathbb{E}_{x \\sim p_\\text{data}}\\left[\\log D(x)\\right] + \\mathbb{E}_{z \\sim p(z)}\\left[\\log \\left(1-D(G(z))\\right)\\right]$$\n", 
    "\n", 
    "### What else is there?\n", 
    "Since 2014, GANs have exploded into a huge research area, with massive [workshops](https://sites.google.com/site/nips2016adversarial/), and [hundreds of new papers](https://github.com/hindupuravinash/the-gan-zoo). Compared to other approaches for generative models, they often produce the highest quality samples but are some of the most difficult and finicky models to train (see [this github repo](https://github.com/soumith/ganhacks) that contains a set of 17 hacks that are useful for getting models working). Improving the stabiilty and robustness of GAN training is an open research question, with new papers coming out every day! For a more recent tutorial on GANs, see [here](https://arxiv.org/abs/1701.00160). There is also some even more recent exciting work that changes the objective function to Wasserstein distance and yields much more stable results across model architectures: [WGAN](https://arxiv.org/abs/1701.07875), [WGAN-GP](https://arxiv.org/abs/1704.00028).\n", 
    "\n", 
    "\n", 
    "GANs are not the only way to train a generative model! For other approaches to generative modeling check out the [deep generative model chapter](http://www.deeplearningbook.org/contents/generative_models.html) of the Deep Learning [book](http://www.deeplearningbook.org). Another popular way of training neural networks as generative models is Variational Autoencoders (co-discovered [here](https://arxiv.org/abs/1312.6114) and [here](https://arxiv.org/abs/1401.4082)). Variatonal autoencoders combine neural networks with variationl inference to train deep generative models. These models tend to be far more stable and easier to train but currently don't produce samples that are as pretty as GANs.\n", 
    "\n", 
    "Here's an example of what your outputs from the 3 different models you're going to train should look like... note that GANs are sometimes finicky, so your outputs might not look exactly like this... this is just meant to be a *rough* guideline of the kind of quality you can expect:\n", 
    "\n", 
    "![caption](gan_outputs_pytorch.png)"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "source": [
    "## Setup"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "import torch\n", 
    "import torch.nn as nn\n", 
    "from torch.nn import init\n", 
    "import torchvision\n", 
    "import torchvision.transforms as T\n", 
    "import torch.optim as optim\n", 
    "from torch.utils.data import DataLoader\n", 
    "from torch.utils.data import sampler\n", 
    "import torchvision.datasets as dset\n", 
    "\n", 
    "import numpy as np\n", 
    "\n", 
    "import matplotlib.pyplot as plt\n", 
    "import matplotlib.gridspec as gridspec\n", 
    "\n", 
    "%matplotlib inline\n", 
    "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n", 
    "plt.rcParams['image.interpolation'] = 'nearest'\n", 
    "plt.rcParams['image.cmap'] = 'gray'\n", 
    "\n", 
    "def show_images(images):\n", 
    "    images = np.reshape(images, [images.shape[0], -1])  # images reshape to (batch_size, D)\n", 
    "    sqrtn = int(np.ceil(np.sqrt(images.shape[0])))\n", 
    "    sqrtimg = int(np.ceil(np.sqrt(images.shape[1])))\n", 
    "\n", 
    "    fig = plt.figure(figsize=(sqrtn, sqrtn))\n", 
    "    gs = gridspec.GridSpec(sqrtn, sqrtn)\n", 
    "    gs.update(wspace=0.05, hspace=0.05)\n", 
    "\n", 
    "    for i, img in enumerate(images):\n", 
    "        ax = plt.subplot(gs[i])\n", 
    "        plt.axis('off')\n", 
    "        ax.set_xticklabels([])\n", 
    "        ax.set_yticklabels([])\n", 
    "        ax.set_aspect('equal')\n", 
    "        plt.imshow(img.reshape([sqrtimg,sqrtimg]))\n", 
    "    return \n", 
    "\n", 
    "def preprocess_img(x):\n", 
    "    return 2 * x - 1.0\n", 
    "\n", 
    "def deprocess_img(x):\n", 
    "    return (x + 1.0) / 2.0\n", 
    "\n", 
    "def rel_error(x,y):\n", 
    "    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))\n", 
    "\n", 
    "def count_params(model):\n", 
    "    \"\"\"Count the number of parameters in the current TensorFlow graph \"\"\"\n", 
    "    param_count = np.sum([np.prod(p.size()) for p in model.parameters()])\n", 
    "    return param_count\n", 
    "\n", 
    "answers = dict(np.load('gan-checks-tf.npz'))"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "source": [
    "## Dataset\n", 
    " GANs are notoriously finicky with hyperparameters, and also require many training epochs. In order to make this assignment approachable without a GPU, we will be working on the MNIST dataset, which is 60,000 training and 10,000 test images. Each picture contains a centered image of white digit on black background (0 through 9). This was one of the first datasets used to train convolutional neural networks and it is fairly easy -- a standard CNN model can easily exceed 99% accuracy. \n", 
    "\n", 
    "To simplify our code here, we will use the PyTorch MNIST wrapper, which downloads and loads the MNIST dataset. See the [documentation](https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py) for more information about the interface. The default parameters will take 5,000 of the training examples and place them into a validation dataset. The data will be saved into a folder called `MNIST_data`. "
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "class ChunkSampler(sampler.Sampler):\n", 
    "    \"\"\"Samples elements sequentially from some offset. \n", 
    "    Arguments:\n", 
    "        num_samples: # of desired datapoints\n", 
    "        start: offset where we should start selecting from\n", 
    "    \"\"\"\n", 
    "    def __init__(self, num_samples, start=0):\n", 
    "        self.num_samples = num_samples\n", 
    "        self.start = start\n", 
    "\n", 
    "    def __iter__(self):\n", 
    "        return iter(range(self.start, self.start + self.num_samples))\n", 
    "\n", 
    "    def __len__(self):\n", 
    "        return self.num_samples\n", 
    "\n", 
    "NUM_TRAIN = 50000\n", 
    "NUM_VAL = 5000\n", 
    "\n", 
    "NOISE_DIM = 96\n", 
    "batch_size = 128\n", 
    "\n", 
    "mnist_train = dset.MNIST('./cs231n/datasets/MNIST_data', train=True, download=True,\n", 
    "                           transform=T.ToTensor())\n", 
    "loader_train = DataLoader(mnist_train, batch_size=batch_size,\n", 
    "                          sampler=ChunkSampler(NUM_TRAIN, 0))\n", 
    "\n", 
    "mnist_val = dset.MNIST('./cs231n/datasets/MNIST_data', train=True, download=True,\n", 
    "                           transform=T.ToTensor())\n", 
    "loader_val = DataLoader(mnist_val, batch_size=batch_size,\n", 
    "                        sampler=ChunkSampler(NUM_VAL, NUM_TRAIN))\n", 
    "\n", 
    "\n", 
    "imgs = loader_train.__iter__().next()[0].view(batch_size, 784).numpy().squeeze()\n", 
    "show_images(imgs)"
   ], 
   "outputs": [], 
   "metadata": {
    "scrolled": false, 
    "collapsed": false
   }
  }, 
  {
   "source": [
    "## Random Noise\n", 
    "Generate uniform noise from -1 to 1 with shape `[batch_size, dim]`.\n", 
    "\n", 
    "Hint: use `torch.rand`."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def sample_noise(batch_size, dim):\n", 
    "    \"\"\"\n", 
    "    Generate a PyTorch Tensor of uniform random noise.\n", 
    "\n", 
    "    Input:\n", 
    "    - batch_size: Integer giving the batch size of noise to generate.\n", 
    "    - dim: Integer giving the dimension of noise to generate.\n", 
    "    \n", 
    "    Output:\n", 
    "    - A PyTorch Tensor of shape (batch_size, dim) containing uniform\n", 
    "      random noise in the range (-1, 1).\n", 
    "    \"\"\"\n", 
    "    pass\n"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "source": [
    "Make sure noise is the correct shape and type:"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def test_sample_noise():\n", 
    "    batch_size = 3\n", 
    "    dim = 4\n", 
    "    torch.manual_seed(231)\n", 
    "    z = sample_noise(batch_size, dim)\n", 
    "    np_z = z.cpu().numpy()\n", 
    "    assert np_z.shape == (batch_size, dim)\n", 
    "    assert torch.is_tensor(z)\n", 
    "    assert np.all(np_z >= -1.0) and np.all(np_z <= 1.0)\n", 
    "    assert np.any(np_z < 0.0) and np.any(np_z > 0.0)\n", 
    "    print('All tests passed!')\n", 
    "    \n", 
    "test_sample_noise()"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
  {
   "source": [
    "## Flatten\n", 
    "\n", 
    "Recall our Flatten operation from previous notebooks... this time we also provide an Unflatten, which you might want to use when implementing the convolutional generator. We also provide a weight initializer (and call it for you) that uses Xavier initialization instead of PyTorch's uniform default."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "class Flatten(nn.Module):\n", 
    "    def forward(self, x):\n", 
    "        N, C, H, W = x.size() # read in N, C, H, W\n", 
    "        return x.view(N, -1)  # \"flatten\" the C * H * W values into a single vector per image\n", 
    "    \n", 
    "class Unflatten(nn.Module):\n", 
    "    \"\"\"\n", 
    "    An Unflatten module receives an input of shape (N, C*H*W) and reshapes it\n", 
    "    to produce an output of shape (N, C, H, W).\n", 
    "    \"\"\"\n", 
    "    def __init__(self, N=-1, C=128, H=7, W=7):\n", 
    "        super(Unflatten, self).__init__()\n", 
    "        self.N = N\n", 
    "        self.C = C\n", 
    "        self.H = H\n", 
    "        self.W = W\n", 
    "    def forward(self, x):\n", 
    "        return x.view(self.N, self.C, self.H, self.W)\n", 
    "\n", 
    "def initialize_weights(m):\n", 
    "    if isinstance(m, nn.Linear) or isinstance(m, nn.ConvTranspose2d):\n", 
    "        init.xavier_uniform_(m.weight.data)"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "source": [
    "## CPU / GPU\n", 
    "By default all code will run on CPU. GPUs are not needed for this assignment, but will help you to train your models faster. If you do want to run the code on a GPU, then change the `dtype` variable in the following cell."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "dtype = torch.FloatTensor\n", 
    "#dtype = torch.cuda.FloatTensor ## UNCOMMENT THIS LINE IF YOU'RE ON A GPU!"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "source": [
    "# Discriminator\n", 
    "Our first step is to build a discriminator. Fill in the architecture as part of the `nn.Sequential` constructor in the function below. All fully connected layers should include bias terms. The architecture is:\n", 
    " * Fully connected layer with input size 784 and output size 256\n", 
    " * LeakyReLU with alpha 0.01\n", 
    " * Fully connected layer with input_size 256 and output size 256\n", 
    " * LeakyReLU with alpha 0.01\n", 
    " * Fully connected layer with input size 256 and output size 1\n", 
    " \n", 
    "Recall that the Leaky ReLU nonlinearity computes $f(x) = \\max(\\alpha x, x)$ for some fixed constant $\\alpha$; for the LeakyReLU nonlinearities in the architecture above we set $\\alpha=0.01$.\n", 
    " \n", 
    "The output of the discriminator should have shape `[batch_size, 1]`, and contain real numbers corresponding to the scores that each of the `batch_size` inputs is a real image."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def discriminator():\n", 
    "    \"\"\"\n", 
    "    Build and return a PyTorch model implementing the architecture above.\n", 
    "    \"\"\"\n", 
    "    model = nn.Sequential(\n", 
    "    )\n", 
    "    return model"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "source": [
    "Test to make sure the number of parameters in the discriminator is correct:"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def test_discriminator(true_count=267009):\n", 
    "    model = discriminator()\n", 
    "    cur_count = count_params(model)\n", 
    "    if cur_count != true_count:\n", 
    "        print('Incorrect number of parameters in discriminator. Check your achitecture.')\n", 
    "    else:\n", 
    "        print('Correct number of parameters in discriminator.')     \n", 
    "\n", 
    "test_discriminator()"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
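  {
   "source": [
    "As a sanity check on the expected count of 267009, assume each fully connected layer with $n_\\text{in}$ inputs and $n_\\text{out}$ outputs contributes $n_\\text{in} \\cdot n_\\text{out}$ weights plus $n_\\text{out}$ biases, and that the LeakyReLU layers have no parameters. The three layers then add up as follows:\n", 
    "$$\\underbrace{784 \\cdot 256 + 256}_{200960} + \\underbrace{256 \\cdot 256 + 256}_{65792} + \\underbrace{256 \\cdot 1 + 1}_{257} = 267009$$\n", 
    "If your count differs, comparing it against these three terms can help locate a layer with the wrong size or a missing bias."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 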
  {
   "source": [
    "# Generator\n", 
    "Now to build the generator network:\n", 
    " * Fully connected layer from noise_dim to 1024\n", 
    " * `ReLU`\n", 
    " * Fully connected layer with size 1024 \n", 
    " * `ReLU`\n", 
    " * Fully connected layer with size 784\n", 
    " * `TanH` (to clip the image to be in the range of [-1,1])"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def generator(noise_dim=NOISE_DIM):\n", 
    "    \"\"\"\n", 
    "    Build and return a PyTorch model implementing the architecture above.\n", 
    "    \"\"\"\n", 
    "    model = nn.Sequential(\n", 
    "    )\n", 
    "    return model"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "source": [
    "Test to make sure the number of parameters in the generator is correct:"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def test_generator(true_count=1858320):\n", 
    "    model = generator(4)\n", 
    "    cur_count = count_params(model)\n", 
    "    if cur_count != true_count:\n", 
    "        print('Incorrect number of parameters in generator. Check your achitecture.')\n", 
    "    else:\n", 
    "        print('Correct number of parameters in generator.')\n", 
    "\n", 
    "test_generator()"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
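  {
   "source": [
    "The expected count of 1858320 can be checked by hand in the same way. Note that the test builds the generator with `noise_dim=4`, not the default `NOISE_DIM`. Assuming each fully connected layer has a bias and the `ReLU` and `TanH` layers have no parameters:\n", 
    "$$\\underbrace{4 \\cdot 1024 + 1024}_{5120} + \\underbrace{1024 \\cdot 1024 + 1024}_{1049600} + \\underbrace{1024 \\cdot 784 + 784}_{803600} = 1858320$$\n", 
    "With the default noise dimension of 96, only the first term changes, so the count will be larger than the one checked in this test."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 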
  {
   "source": [
    "# GAN Loss\n", 
    "\n", 
    "Compute the generator and discriminator loss. The generator loss is:\n", 
    "$$\\ell_G  =  -\\mathbb{E}_{z \\sim p(z)}\\left[\\log D(G(z))\\right]$$\n", 
    "and the discriminator loss is:\n", 
    "$$ \\ell_D = -\\mathbb{E}_{x \\sim p_\\text{data}}\\left[\\log D(x)\\right] - \\mathbb{E}_{z \\sim p(z)}\\left[\\log \\left(1-D(G(z))\\right)\\right]$$\n", 
    "Note that these are negated from the equations presented earlier as we will be *minimizing* these losses.\n", 
    "\n", 
    "**HINTS**: You should use the `bce_loss` function defined below to compute the binary cross entropy loss which is needed to compute the log probability of the true label given the logits output from the discriminator. Given a score $s\\in\\mathbb{R}$ and a label $y\\in\\{0, 1\\}$, the binary cross entropy loss is\n", 
    "\n", 
    "$$ bce(s, y) = -y * \\log(s) - (1 - y) * \\log(1 - s) $$\n", 
    "\n", 
    "A naive implementation of this formula can be numerically unstable, so we have provided a numerically stable implementation for you below.\n", 
    "\n", 
    "You will also need to compute labels corresponding to real or fake and use the logit arguments to determine their size. Make sure you cast these labels to the correct data type using the global `dtype` variable, for example:\n", 
    "\n", 
    "\n", 
    "`true_labels = torch.ones(size).type(dtype)`\n", 
    "\n", 
    "Instead of computing the expectation of $\\log D(G(z))$, $\\log D(x)$ and $\\log \\left(1-D(G(z))\\right)$, we will be averaging over elements of the minibatch, so make sure to combine the loss by averaging instead of summing."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def bce_loss(input, target):\n", 
    "    \"\"\"\n", 
    "    Numerically stable version of the binary cross-entropy loss function.\n", 
    "\n", 
    "    As per https://github.com/pytorch/pytorch/issues/751\n", 
    "    See the TensorFlow docs for a derivation of this formula:\n", 
    "    https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits\n", 
    "\n", 
    "    Inputs:\n", 
    "    - input: PyTorch Tensor of shape (N, ) giving scores.\n", 
    "    - target: PyTorch Tensor of shape (N,) containing 0 and 1 giving targets.\n", 
    "\n", 
    "    Returns:\n", 
    "    - A PyTorch Tensor containing the mean BCE loss over the minibatch of input data.\n", 
    "    \"\"\"\n", 
    "    neg_abs = - input.abs()\n", 
    "    loss = input.clamp(min=0) - input * target + (1 + neg_abs.exp()).log()\n", 
    "    return loss.mean()"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
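  {
   "source": [
    "For reference, here is a sketch of why the expression in `bce_loss` matches binary cross entropy, assuming the input score $s$ is a logit that the sigmoid $\\sigma$ maps to a probability. Using $-\\log \\sigma(s) = \\log(1 + e^{-s})$ and $-\\log(1 - \\sigma(s)) = s + \\log(1 + e^{-s})$:\n", 
    "$$-y \\log \\sigma(s) - (1 - y) \\log\\left(1 - \\sigma(s)\\right) = s - s y + \\log\\left(1 + e^{-s}\\right) = \\max(s, 0) - s y + \\log\\left(1 + e^{-|s|}\\right)$$\n", 
    "The last form is the one used in the code. It only ever exponentiates a non-positive number, so it cannot overflow for large $|s|$."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 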
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def discriminator_loss(logits_real, logits_fake):\n", 
    "    \"\"\"\n", 
    "    Computes the discriminator loss described above.\n", 
    "    \n", 
    "    Inputs:\n", 
    "    - logits_real: PyTorch Tensor of shape (N,) giving scores for the real data.\n", 
    "    - logits_fake: PyTorch Tensor of shape (N,) giving scores for the fake data.\n", 
    "    \n", 
    "    Returns:\n", 
    "    - loss: PyTorch Tensor containing (scalar) the loss for the discriminator.\n", 
    "    \"\"\"\n", 
    "    loss = None\n", 
    "    return loss\n", 
    "\n", 
    "def generator_loss(logits_fake):\n", 
    "    \"\"\"\n", 
    "    Computes the generator loss described above.\n", 
    "\n", 
    "    Inputs:\n", 
    "    - logits_fake: PyTorch Tensor of shape (N,) giving scores for the fake data.\n", 
    "    \n", 
    "    Returns:\n", 
    "    - loss: PyTorch Tensor containing the (scalar) loss for the generator.\n", 
    "    \"\"\"\n", 
    "    loss = None\n", 
    "    return loss"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "source": [
    "Test your generator and discriminator loss. You should see errors < 1e-7."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def test_discriminator_loss(logits_real, logits_fake, d_loss_true):\n", 
    "    d_loss = discriminator_loss(torch.Tensor(logits_real).type(dtype),\n", 
    "                                torch.Tensor(logits_fake).type(dtype)).cpu().numpy()\n", 
    "    print(\"Maximum error in d_loss: %g\"%rel_error(d_loss_true, d_loss))\n", 
    "\n", 
    "test_discriminator_loss(answers['logits_real'], answers['logits_fake'],\n", 
    "                        answers['d_loss_true'])"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def test_generator_loss(logits_fake, g_loss_true):\n", 
    "    g_loss = generator_loss(torch.Tensor(logits_fake).type(dtype)).cpu().numpy()\n", 
    "    print(\"Maximum error in g_loss: %g\"%rel_error(g_loss_true, g_loss))\n", 
    "\n", 
    "test_generator_loss(answers['logits_fake'], answers['g_loss_true'])"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
  {
   "source": [
    "# Optimizing our loss\n", 
    "Make a function that returns an `optim.Adam` optimizer for the given model with a 1e-3 learning rate, beta1=0.5, beta2=0.999. You'll use this to construct optimizers for the generators and discriminators for the rest of the notebook."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def get_optimizer(model):\n", 
    "    \"\"\"\n", 
    "    Construct and return an Adam optimizer for the model with learning rate 1e-3,\n", 
    "    beta1=0.5, and beta2=0.999.\n", 
    "    \n", 
    "    Input:\n", 
    "    - model: A PyTorch model that we want to optimize.\n", 
    "    \n", 
    "    Returns:\n", 
    "    - An Adam optimizer for the model with the desired hyperparameters.\n", 
    "    \"\"\"\n", 
    "    optimizer = None\n", 
    "    return optimizer"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "source": [
    "# Training a GAN!\n", 
    "\n", 
    "We provide you the main training loop... you won't need to change this function, but we encourage you to read through and understand it. "
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def run_a_gan(D, G, D_solver, G_solver, discriminator_loss, generator_loss, show_every=250, \n", 
    "              batch_size=128, noise_size=96, num_epochs=10):\n", 
    "    \"\"\"\n", 
    "    Train a GAN!\n", 
    "    \n", 
    "    Inputs:\n", 
    "    - D, G: PyTorch models for the discriminator and generator\n", 
    "    - D_solver, G_solver: torch.optim Optimizers to use for training the\n", 
    "      discriminator and generator.\n", 
    "    - discriminator_loss, generator_loss: Functions to use for computing the generator and\n", 
    "      discriminator loss, respectively.\n", 
    "    - show_every: Show samples after every show_every iterations.\n", 
    "    - batch_size: Batch size to use for training.\n", 
    "    - noise_size: Dimension of the noise to use as input to the generator.\n", 
    "    - num_epochs: Number of epochs over the training dataset to use for training.\n", 
    "    \"\"\"\n", 
    "    iter_count = 0\n", 
    "    for epoch in range(num_epochs):\n", 
    "        for x, _ in loader_train:\n", 
    "            if len(x) != batch_size:\n", 
    "                continue\n", 
    "            D_solver.zero_grad()\n", 
    "            real_data = x.type(dtype)\n", 
    "            logits_real = D(2* (real_data - 0.5)).type(dtype)\n", 
    "\n", 
    "            g_fake_seed = sample_noise(batch_size, noise_size).type(dtype)\n", 
    "            fake_images = G(g_fake_seed).detach()\n", 
    "            logits_fake = D(fake_images.view(batch_size, 1, 28, 28))\n", 
    "\n", 
    "            d_total_error = discriminator_loss(logits_real, logits_fake)\n", 
    "            d_total_error.backward()        \n", 
    "            D_solver.step()\n", 
    "\n", 
    "            G_solver.zero_grad()\n", 
    "            g_fake_seed = sample_noise(batch_size, noise_size).type(dtype)\n", 
    "            fake_images = G(g_fake_seed)\n", 
    "\n", 
    "            gen_logits_fake = D(fake_images.view(batch_size, 1, 28, 28))\n", 
    "            g_error = generator_loss(gen_logits_fake)\n", 
    "            g_error.backward()\n", 
    "            G_solver.step()\n", 
    "\n", 
    "            if (iter_count % show_every == 0):\n", 
    "                print('Iter: {}, D: {:.4}, G:{:.4}'.format(iter_count,d_total_error.item(),g_error.item()))\n", 
    "                imgs_numpy = fake_images.data.cpu().numpy()\n", 
    "                show_images(imgs_numpy[0:16])\n", 
    "                plt.show()\n", 
    "                print()\n", 
    "            iter_count += 1"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "# Make the discriminator\n", 
    "D = discriminator().type(dtype)\n", 
    "\n", 
    "# Make the generator\n", 
    "G = generator().type(dtype)\n", 
    "\n", 
    "# Use the function you wrote earlier to get optimizers for the Discriminator and the Generator\n", 
    "D_solver = get_optimizer(D)\n", 
    "G_solver = get_optimizer(G)\n", 
    "# Run it!\n", 
    "run_a_gan(D, G, D_solver, G_solver, discriminator_loss, generator_loss)"
   ], 
   "outputs": [], 
   "metadata": {
    "scrolled": true, 
    "collapsed": false
   }
  }, 
  {
   "source": [
    "Well that wasn't so hard, was it? In the iterations in the low 100s you should see black backgrounds, fuzzy shapes as you approach iteration 1000, and decent shapes, about half of which will be sharp and clearly recognizable as we pass 3000."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "source": [
    "# Least Squares GAN\n", 
    "We'll now look at [Least Squares GAN](https://arxiv.org/abs/1611.04076), a newer, more stable alernative to the original GAN loss function. For this part, all we have to do is change the loss function and retrain the model. We'll implement equation (9) in the paper, with the generator loss:\n", 
    "$$\\ell_G  =  \\frac{1}{2}\\mathbb{E}_{z \\sim p(z)}\\left[\\left(D(G(z))-1\\right)^2\\right]$$\n", 
    "and the discriminator loss:\n", 
    "$$ \\ell_D = \\frac{1}{2}\\mathbb{E}_{x \\sim p_\\text{data}}\\left[\\left(D(x)-1\\right)^2\\right] + \\frac{1}{2}\\mathbb{E}_{z \\sim p(z)}\\left[ \\left(D(G(z))\\right)^2\\right]$$\n", 
    "\n", 
    "\n", 
    "**HINTS**: Instead of computing the expectation, we will be averaging over elements of the minibatch, so make sure to combine the loss by averaging instead of summing. When plugging in for $D(x)$ and $D(G(z))$ use the direct output from the discriminator (`scores_real` and `scores_fake`)."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def ls_discriminator_loss(scores_real, scores_fake):\n", 
    "    \"\"\"\n", 
    "    Compute the Least-Squares GAN loss for the discriminator.\n", 
    "    \n", 
    "    Inputs:\n", 
    "    - scores_real: PyTorch Tensor of shape (N,) giving scores for the real data.\n", 
    "    - scores_fake: PyTorch Tensor of shape (N,) giving scores for the fake data.\n", 
    "    \n", 
    "    Outputs:\n", 
    "    - loss: A PyTorch Tensor containing the loss.\n", 
    "    \"\"\"\n", 
    "    loss = None\n", 
    "    return loss\n", 
    "\n", 
    "def ls_generator_loss(scores_fake):\n", 
    "    \"\"\"\n", 
    "    Computes the Least-Squares GAN loss for the generator.\n", 
    "    \n", 
    "    Inputs:\n", 
    "    - scores_fake: PyTorch Tensor of shape (N,) giving scores for the fake data.\n", 
    "    \n", 
    "    Outputs:\n", 
    "    - loss: A PyTorch Tensor containing the loss.\n", 
    "    \"\"\"\n", 
    "    loss = None\n", 
    "    return loss"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": true
   }
  }, 
  {
   "source": [
    "Before running a GAN with our new loss function, let's check it:"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def test_lsgan_loss(score_real, score_fake, d_loss_true, g_loss_true):\n", 
    "    score_real = torch.Tensor(score_real).type(dtype)\n", 
    "    score_fake = torch.Tensor(score_fake).type(dtype)\n", 
    "    d_loss = ls_discriminator_loss(score_real, score_fake).cpu().numpy()\n", 
    "    g_loss = ls_generator_loss(score_fake).cpu().numpy()\n", 
    "    print(\"Maximum error in d_loss: %g\"%rel_error(d_loss_true, d_loss))\n", 
    "    print(\"Maximum error in g_loss: %g\"%rel_error(g_loss_true, g_loss))\n", 
    "\n", 
    "test_lsgan_loss(answers['logits_real'], answers['logits_fake'],\n", 
    "                answers['d_loss_lsgan_true'], answers['g_loss_lsgan_true'])"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
  {
   "source": [
    "Run the following cell to train your model!"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "D_LS = discriminator().type(dtype)\n", 
    "G_LS = generator().type(dtype)\n", 
    "\n", 
    "D_LS_solver = get_optimizer(D_LS)\n", 
    "G_LS_solver = get_optimizer(G_LS)\n", 
    "\n", 
    "run_a_gan(D_LS, G_LS, D_LS_solver, G_LS_solver, ls_discriminator_loss, ls_generator_loss)"
   ], 
   "outputs": [], 
   "metadata": {
    "scrolled": false, 
    "collapsed": false
   }
  }, 
  {
   "source": [
    "# Deep Convolutional GANs\n", 
    "In the first part of the notebook, we implemented an almost direct copy of the original GAN network from Ian Goodfellow. However, this network architecture allows no real spatial reasoning: because it lacks convolutional layers, it is unable to reason about things like \"sharp edges\" in general. Thus, in this section, we will implement some of the ideas from [DCGAN](https://arxiv.org/abs/1511.06434), where we use convolutional networks for both the discriminator and the generator.\n", 
    "\n", 
    "#### Discriminator\n", 
    "We will use a discriminator inspired by the TensorFlow MNIST classification tutorial, which is able to get above 99% accuracy on the MNIST dataset fairly quickly. \n", 
    "* Reshape into image tensor (Use Unflatten!)\n", 
    "* Conv2D: 32 Filters, 5x5, Stride 1\n", 
    "* Leaky ReLU(alpha=0.01)\n", 
    "* Max Pool 2x2, Stride 2\n", 
    "* Conv2D: 64 Filters, 5x5, Stride 1\n", 
    "* Leaky ReLU(alpha=0.01)\n", 
    "* Max Pool 2x2, Stride 2\n", 
    "* Flatten\n", 
    "* Fully Connected with output size 4 x 4 x 64\n", 
    "* Leaky ReLU(alpha=0.01)\n", 
    "* Fully Connected with output size 1"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
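  {
   "source": [
    "As a rough check on the layer list above, the spatial size can be traced by hand. This sketch assumes the convolutions use no padding (the list does not specify padding, so that is an assumption), in which case a $k \\times k$ convolution with stride 1 maps a side length $H$ to $H - k + 1$, and a $2 \\times 2$ max pool with stride 2 halves it:\n", 
    "$$28 \\xrightarrow{\\text{conv } 5 \\times 5} 24 \\xrightarrow{\\text{pool}} 12 \\xrightarrow{\\text{conv } 5 \\times 5} 8 \\xrightarrow{\\text{pool}} 4$$\n", 
    "Under that assumption the flattened feature vector has $4 \\cdot 4 \\cdot 64 = 1024$ entries, which is the input size of the first fully connected layer (and happens to equal its listed output size)."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 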
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def build_dc_classifier():\n", 
    "    \"\"\"\n", 
    "    Build and return a PyTorch model for the DCGAN discriminator implementing\n", 
    "    the architecture above.\n", 
    "    \"\"\"\n", 
    "    return nn.Sequential(\n", 
    "        ###########################\n", 
    "        ######### TO DO ###########\n", 
    "        ###########################\n", 
    "        Unflatten(batch_size, 1, 28, 28),\n", 
    "    )\n", 
    "\n", 
    "data = next(iter(loader_train))[0].type(dtype)  # images from the first training batch\n", 
    "b = build_dc_classifier().type(dtype)\n", 
    "out = b(data)\n", 
    "print(out.size())"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
  {
   "source": [
    "Check the number of parameters in your classifier as a sanity check:"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def test_dc_classifier(true_count=1102721):\n", 
    "    model = build_dc_classifier()\n", 
    "    cur_count = count_params(model)\n", 
    "    if cur_count != true_count:\n", 
    "        print('Incorrect number of parameters in classifier. Check your architecture.')\n", 
    "    else:\n", 
    "        print('Correct number of parameters in classifier.')\n", 
    "\n", 
    "test_dc_classifier()"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
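  {
   "source": [
    "If the check above reports a mismatch, a per-layer tally can help locate it. The breakdown below is only a sketch: it assumes every layer has a bias term, that the flattened feature vector has $4 \\cdot 4 \\cdot 64 = 1024$ entries, and that `count_params` counts only learnable weights and biases.\n", 
    "$$\\begin{aligned}\n", 
    "\\text{Conv2D, 32 filters} &: 5 \\cdot 5 \\cdot 1 \\cdot 32 + 32 = 832 \\\\\n", 
    "\\text{Conv2D, 64 filters} &: 5 \\cdot 5 \\cdot 32 \\cdot 64 + 64 = 51264 \\\\\n", 
    "\\text{Fully connected, } 1024 \\to 1024 &: 1024 \\cdot 1024 + 1024 = 1049600 \\\\\n", 
    "\\text{Fully connected, } 1024 \\to 1 &: 1024 + 1 = 1025 \\\\\n", 
    "\\text{Total} &: 832 + 51264 + 1049600 + 1025 = 1102721\n", 
    "\\end{aligned}$$\n", 
    "The Leaky ReLU, max pool, and reshape layers contribute no parameters."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 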
  {
   "source": [
    "#### Generator\n", 
    "For the generator, we will copy the architecture exactly from the [InfoGAN paper](https://arxiv.org/pdf/1606.03657.pdf); see Appendix C.1 (MNIST). The transposed convolution is described in the documentation for [tf.nn.conv2d_transpose](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d_transpose); its PyTorch counterpart is `nn.ConvTranspose2d`. When running a GAN, we are always in \"training\" mode. \n", 
    "* Fully connected with output size 1024\n", 
    "* ReLU\n", 
    "* BatchNorm\n", 
    "* Fully connected with output size 7 x 7 x 128 \n", 
    "* ReLU\n", 
    "* BatchNorm\n", 
    "* Reshape into Image Tensor of shape 7, 7, 128 (that is, 128 x 7 x 7 in PyTorch's channels-first layout)\n", 
    "* Conv2D^T (Transpose): 64 filters of 4x4, stride 2, 'same' padding\n", 
    "* ReLU\n", 
    "* BatchNorm\n", 
    "* Conv2D^T (Transpose): 1 filter of 4x4, stride 2, 'same' padding\n", 
    "* TanH\n", 
    "* The result should be a 28x28x1 image; reshape it back into a 784-dimensional vector"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
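  {
   "source": [
    "The two transposed convolutions above are what upsample the $7 \\times 7$ feature map to $28 \\times 28$. As a sketch for PyTorch's `nn.ConvTranspose2d` with kernel size $k$, stride $s$, and padding $p$ (no dilation or output padding), the output side length is given below. The choice $p = 1$ is an assumption made here to reproduce the doubling that 'same' padding gives at stride 2; it is not stated in the list above.\n", 
    "$$H_\\text{out} = (H_\\text{in} - 1)\\,s - 2p + k$$\n", 
    "With $k = 4$, $s = 2$, and $p = 1$, the first layer gives $(7 - 1) \\cdot 2 - 2 + 4 = 14$ and the second gives $(14 - 1) \\cdot 2 - 2 + 4 = 28$."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 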
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def build_dc_generator(noise_dim=NOISE_DIM):\n", 
    "    \"\"\"\n", 
    "    Build and return a PyTorch model implementing the DCGAN generator using\n", 
    "    the architecture described above.\n", 
    "    \"\"\"\n", 
    "    return nn.Sequential(\n", 
    "        ###########################\n", 
    "        ######### TO DO ###########\n", 
    "        ###########################\n", 
    "    )\n", 
    "\n", 
    "test_g_gan = build_dc_generator().type(dtype)\n", 
    "test_g_gan.apply(initialize_weights)\n", 
    "\n", 
    "fake_seed = torch.randn(batch_size, NOISE_DIM).type(dtype)\n", 
    "fake_images = test_g_gan(fake_seed)  # call the module directly rather than .forward()\n", 
    "fake_images.size()"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
  {
   "source": [
    "Check the number of parameters in your generator as a sanity check:"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "def test_dc_generator(true_count=6580801):\n", 
    "    model = build_dc_generator(4)\n", 
    "    cur_count = count_params(model)\n", 
    "    if cur_count != true_count:\n", 
    "        print('Incorrect number of parameters in generator. Check your architecture.')\n", 
    "    else:\n", 
    "        print('Correct number of parameters in generator.')\n", 
    "\n", 
    "test_dc_generator()"
   ], 
   "outputs": [], 
   "metadata": {
    "collapsed": false
   }
  }, 
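  {
   "source": [
    "Note that the test above builds the generator with `noise_dim=4`. If the count is off, the per-layer tally below may help. It is a sketch that assumes every linear and transposed-convolution layer has a bias, that each BatchNorm contributes one learnable scale and one learnable shift per feature, and that `count_params` counts only those learnable values.\n", 
    "$$\\begin{aligned}\n", 
    "\\text{Fully connected, } 4 \\to 1024 &: 4 \\cdot 1024 + 1024 = 5120 \\\\\n", 
    "\\text{BatchNorm, 1024 features} &: 2 \\cdot 1024 = 2048 \\\\\n", 
    "\\text{Fully connected, } 1024 \\to 6272 &: 1024 \\cdot 6272 + 6272 = 6428800 \\\\\n", 
    "\\text{BatchNorm, 6272 features} &: 2 \\cdot 6272 = 12544 \\\\\n", 
    "\\text{Conv2D}^T\\text{, } 128 \\to 64 &: 128 \\cdot 64 \\cdot 4 \\cdot 4 + 64 = 131136 \\\\\n", 
    "\\text{BatchNorm, 64 channels} &: 2 \\cdot 64 = 128 \\\\\n", 
    "\\text{Conv2D}^T\\text{, } 64 \\to 1 &: 64 \\cdot 1 \\cdot 4 \\cdot 4 + 1 = 1025 \\\\\n", 
    "\\text{Total} &: 6580801\n", 
    "\\end{aligned}$$\n", 
    "Here $6272 = 7 \\cdot 7 \\cdot 128$; the ReLU, TanH, and reshape layers contribute no parameters."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 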
  {
   "execution_count": null, 
   "cell_type": "code", 
   "source": [
    "D_DC = build_dc_classifier().type(dtype) \n", 
    "D_DC.apply(initialize_weights)\n", 
    "G_DC = build_dc_generator().type(dtype)\n", 
    "G_DC.apply(initialize_weights)\n", 
    "\n", 
    "D_DC_solver = get_optimizer(D_DC)\n", 
    "G_DC_solver = get_optimizer(G_DC)\n", 
    "\n", 
    "run_a_gan(D_DC, G_DC, D_DC_solver, G_DC_solver, discriminator_loss, generator_loss, num_epochs=5)"
   ], 
   "outputs": [], 
   "metadata": {
    "scrolled": false, 
    "collapsed": false
   }
  }, 
  {
   "source": [
    "## INLINE QUESTION 1\n", 
    "\n", 
    "We will look at an example to see why alternating optimization of the same objective (like in a GAN) can be tricky business.\n", 
    "\n", 
    "Consider $f(x,y)=xy$. What does $\\min_x\\max_y f(x,y)$ evaluate to? (Hint: minmax tries to minimize the maximum value achievable.)\n", 
    "\n", 
    "Now try to evaluate this function numerically for 6 steps, starting at the point $(1,1)$, \n", 
    "by using alternating gradient updates (first updating $y$, then updating $x$) with step size $1$. \n", 
    "You'll find that writing out the update step in terms of $x_t,y_t,x_{t+1},y_{t+1}$ will be useful.\n", 
    "\n", 
    "Record the six pairs of explicit values for $(x_t,y_t)$ in the table below."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "source": [
    "### Your answer:\n", 
    " \n", 
    " $y_0$ | $y_1$ | $y_2$ | $y_3$ | $y_4$ | $y_5$ | $y_6$ \n", 
    " ----- | ----- | ----- | ----- | ----- | ----- | ----- \n", 
    "   1   |       |       |       |       |       |       \n", 
    " $x_0$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ \n", 
    "   1   |       |       |       |       |       |       \n", 
    "   "
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "source": [
    "## INLINE QUESTION 2\n", 
    "Using this method, will we ever reach the optimal value? Why or why not?"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "source": [
    "### Your answer:"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "source": [
    "## INLINE QUESTION 3\n", 
    "If the generator loss decreases during training while the discriminator loss stays at a constant high value from the start, is this a good sign? Why or why not? A qualitative answer is sufficient."
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }, 
  {
   "source": [
    "### Your answer:"
   ], 
   "cell_type": "markdown", 
   "metadata": {}
  }
 ], 
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3", 
   "name": "python3", 
   "language": "python"
  }, 
  "language_info": {
   "mimetype": "text/x-python", 
   "nbconvert_exporter": "python", 
   "name": "python", 
   "file_extension": ".py", 
   "version": "3.6.1", 
   "pygments_lexer": "ipython3", 
   "codemirror_mode": {
    "version": 3, 
    "name": "ipython"
   }
  }
 }
}